Abstract

Automation and robotics for space applications have been proposed
for increased productivity, improved reliability, increased
flexibility, and higher safety; for automating time-consuming
tasks; for increasing the productivity/performance of
crew-accomplished tasks; and for performing tasks beyond the
capability of the crew. This
paper provides a review of efforts currently in progress at the 
NASA/Johnson Space Center and at Rice University in the 
area of robotic vision. Both systems and algorithms are dis- 
cussed. The evolution of future vision/sensing is projected to 
include the fusion of multisensors ranging from microwave to 
optical with multimode capability to include position, attitude, 
recognition, and motion parameters. The key features of the 
overall system design will be small size and weight, fast sig- 
nal processing, robust algorithms, and accurate parameter 
determination. These aspects of vision/sensing will also be 
discussed in this paper. 

I. Introduction

The Space Station, a major goal of our space program
over the next decade, has been planned as a multipurpose 
facility in which missions of long duration can be conducted 
and supported. These missions will include science and appli- 
cations, observation, technology development and demonstra- 
tion, commercial laboratories and production facilities, opera- 
tional activities such as servicing/maintenance, repair of satel- 
lites, support of unmanned platforms, assembly of large space 
systems, and as a transportation node for transfer to other 
orbits and planetary missions. An important technology area 
foreseen to increase productivity and enhance astronaut safety 
is automation and robotics (A&R). The use of A&R for the 
Space Station can be viewed in two major areas: 
teleoperated/robotic systems for servicing, maintenance, 
repairs, and assembly; and computerized systems to reduce 
the manpower requirements of planning, monitoring, diagnosis,
control, and fault recovery of systems/subsystems. In addition
to increasing productivity through autonomy, A&R will result in
increased operational capability and flexibility. Robotic
operations for the Space Station will involve
maintenance/repair of the entire structure including various 
subsystems, orbiter/satellite servicing, astronaut assistance, 
equipment transfer, docking and berthing, inspection, remote 
monitoring, rocket staging, telescience, and assembly of the 
Station and large structures. To aid the astronauts in various
tasks and relieve them of some activities, robots must
perform beyond the current state of the art by responding to a 
high degree of environmental uncertainty and operational 
flexibility. In order to accommodate various performance 
goals in robots, design concepts have been proposed by many 
organizations [1,2,3,4,5,6]. One approach advanced by the
NASA/Johnson Space Center (NASA/JSC) involves using the 
Shuttle Remote Manipulator System (RMS) for Space Station 
Assembly (Figure 1). The next step in the use of A&R would 
be the Space Station Mobile RMS (now known as the Mobile Service
Center (MSC)). With the MSC, tasks such as the final
Station assembly, Station/satellite maintenance and repair, and 
routine inspections could be accomplished. The Orbital 
Maneuvering Vehicle (OMV) could similarly be used for the 
retrieval and repair services. As a final step, robots could be 
made autonomous and free-flying for inspection, retrieval, and 
repair tasks. A simplified schematic which shows the func- 
tional elements of an automated system is presented in Fig- 
ure 2. One of the key elements of the system is sensing and 
perception. Its primary function is to provide information 
regarding the position of the object in its environment relative 
to the system’s effector. This function involves isolation, 
description, identification, location, and data transmission. In 
a broader context a class of object properties which include 
geometric, mechanical, material, optical, acoustic, electric, 
magnetic, radioactive, chemical, and weight may be needed. 

The type, volume, and precision/accuracy of the data needed 
will depend on the nature of the task to be accomplished. 

As the robotics era dawns in space, vision will provide 
the key sensory data needed for multifaceted intelligent opera- 
tions. In general the 3D scene/object description along with 
location, orientation, and motion parameters will be needed. 
Sensor complements may include both active and passive 
microwave and optical with multifunction capability. The 
fusion of the information from these sensors to provide accurate
parameters for robots poses by far the greatest challenge in
vision. Furthermore, the compression, storage, and
transmission of the information associated with multisensor 
capability require novel algorithms and hardware for efficient 
operation. 

In this paper, the vision data requirements are discussed 
from the standpoint of various applications. A review of the 
advanced systems technology for space applications is provided.
The progress in the area of algorithm development for
parameter estimation is summarized. Future concepts in both
sensor and algorithm development are elaborated.
II. Vision Requirements

The vision requirements for space robotics are character- 
ized by environmental factors and tasks that the robot has to 
perform. The natural space environment consists of intense 
light and dark periods. At a nominal Space Station altitude of 
270 nmi, the sunlight intensity will fluctuate between about 
sixty minutes of extreme brightness (13,000 footcandles) and 
thirty minutes of near darkness [7]. Furthermore, due to the
absence of atmosphere, light is not diffused/scattered. Conse- 
quently, the unenhanced images have large contrast with 
intensity changes of the order of 10. The intense specular 
reflections combined with camera performance can cause 
bloom and Fraunhofer/Airy rings resulting in scene obscurity. 
Further complexity results from other objects, such as stars, 
moon, sun, Earth, and other satellites in the field of 
view (FOV). Object reflectivity can also pose problems for the
vision systems. Most space systems are painted white or 
finished with smooth, specular materials to provide highly 
reflective surfaces. The ubiquity of white surfaces intensifies
the problem of relying on photometric data for object 
identification/discrimination. A secondary source of concern 
affecting vision is the absence of gravity. For free flying and 
tethered objects there would be an increased number of positions
and orientations in which the objects may appear due
to the lack of disturbances caused by aerodynamic and gravi- 
tational forces. 

Many tasks have been proposed for robotic operations in 
the Space Station era. The most significant tasks include 
assembly of space structures, maintenance and repair, inspection,
and aiding/retrieving of astronauts during Extra-Vehicular
Activity (EVA). Assembly can include mating structures, 
bolting, locking, and forming joints in the structure itself. For 
the Space Station assembly, initially the Shuttle Remote Manipulator
System (RMS) controlled by astronauts in the cabin or
EVA can be used. This can be followed in the later stages of 
assembly by the Space Station Mobile Service Center (MSC). 
As time proceeds, the assembly processes could involve Orbi- 
tal Maneuvering Vehicle (OMV) or a free flying robot with 
robotic arms and various sensors for vision (Figure 3). 

In the case of Space Station the maintenance and repair 
tasks can include structural damage, failure of systems and 
components, cleaning and storage of space debris, and 
environmental effects on the spacecraft. Inspection is the first
significant part of the maintenance and repair activity. The 
inspection can include low velocity encounters with solar 
array, thermal radiator, hulls, windows, TV cameras, heaters, 
fluid containers, aging composite materials, printers, record- 
ers, and door mechanisms. The unpredictable nature of 
maintenance and repair tasks creates a problem in the 
development of the capability/design of space robots. The 
vision capabilities must be adaptive/versatile to accommodate 
these uncertainties. A detailed comparison of computer-stored
scenes/identification parameters with data from
distorted/damaged systems, structures, or components reveals
the failure or absence of parts. Another crucial application
of the space robot is the retrieval or aiding of an EVA or Manned
Maneuvering Unit (MMU) seated astronaut. This task can be
thought of as a special one belonging to the class of retrieval
and repair of spacecraft/orbiting objects. The
robots/manipulators (such as RMS, MSC) can be operated by
humans using teleoperation. The commands and other vital
data are transmitted using communications systems. In more 
advanced concepts, the humans act as supervisors, setting 
schedules, tasks, and evaluating the performance of the robots. 
When this interaction is negligible, the robot can act
autonomously.

For direct-control teleoperator systems, the primary
function of robotic vision is to provide information about the 
position of the object relative to the system’s effector. In a 
direct controlled teleoperator system, some subfunctions may 
be allocated to the humans. In particular, the object 
identification can be delegated to humans. In a goal-directed 
teleoperator system, reliability and flexibility have to be
incorporated so as to allow it to search, identify, and locate parts
based on an existing database. In the absence of the vision
system, the required object identification and position data have to
be entered manually. In general, for the autonomous robotic 
systems, three levels of information are needed. These levels 
pertain to the scene/world in which the objects are located, the
objects themselves, and the specific parts of the objects. In
most tasks an envelope of parameters can be preprogrammed
into the system. For example, in docking and berthing appli- 
cations the robotic vision/sensing may be needed within a 
cone of thirty degrees to a distance of fifty meters. Beyond 
this spatial zone, a radar system may be used for 
tracking/monitoring the object motion. 

The levels of information depend on the application 
involved. As an example in the satellite servicing area, the 
vision/sensing system may have to provide the necessary 
information to guide a robot/astrobot to a particular area, say 
an antenna feed. The satellite could be rotating and translating
simultaneously. Furthermore, the antenna could be gimbaling
with a certain motion. In this scenario, the data would
not only include a 3D dynamic description of the target/object
but also its position and rotational parameters with respect to
the satellite. To accomplish this, algorithms are needed for
the parameter estimation. The vision/sensing instrumentation
in this case would involve not only fixed field of view video
systems, but also laser/millimeter wave radars which could be
slaved to the antenna feed motion. Doppler signal processing
at microwave or optical frequencies can be used to sense moving
parts within a scene. The implication for the
vision/sensing systems is clear: several sensors are needed to
complete the basic information needed for an autonomous
robot. Clearly, for space applications the size, speed, and
weight parameters are of paramount importance. Autonomous 
robot performance depends crucially on the vision capabili- 
ties. In certain operations, humans can be surpassed by robots 
due to memory and vision. Robotic vision can allow more 
precise measurements and faster response during time-critical 
situations. Other limitations of humans include fatigue, a lim- 
ited spectrum, and inaccurate color and grey scale resolution. 
Robotic vision provides an opportunity to utilize active and 
passive sensors in microwave and optical/infrared bands.
Furthermore, polarization and look angles can be optimized to
accurately measure scene parameters of interest. The extension
of vision to shadowed and occluded regions is important in
many applications. The illumination intensity variations
along with shadows can be used to determine the relative
motion between the camera and the object. Structured multispectral
illumination can be used to derive the 3-D description
of the target. Mathematical models coupled with real-time
imagery/data can be used to derive the motion and shape 
parameters. Associated with the sensory data is the need for 
computer architectures which provide high-speed processing, 
parallel computations/algorithms, associative memories, and 
intelligence. The transfer and reduction of the sensor data 
requires an efficient communications subsystem [8]. 

Moving objects in space need to be recognized and 
defined in terms of their position, orientation, and velocities 
for proximity operations. Soft docking is an important operation
for many robotic activities. A detailed analysis of docking
for robotic manipulations shows that many benefits can be
gained from accurate tracking sensor data. The perturbations
of the target/object position and attitude can be minimized by 
using accurate measurements of distance/range and velocity. 
The relative velocity and maximum shuttle RMS tip force for 
the docking of Shuttle to Space Station were analyzed by Mis- 
sion Planning and Analysis Division of NASA/Johnson Space 
Center (JSC). The capture stopping distances and the relative 
velocities for various forces are shown in Figure 4. With very
accurate knowledge of the relative velocity (0.1 ft/sec), the
perturbation force can be minimized at a particular stopping 
distance. The same result holds for a robot docking with a 
satellite, etc. Based on these considerations a laser docking 
sensor is now under development at JSC to provide perfor- 
mance goals as stated in Table 1. This is an illustrative exam- 
ple of the type of vision data needed in addition to the 
imagery. The overall scope of the vision data needed for the 
proposed JSC EVA Retriever project is depicted in Figure 5. 
A multi-sensor vision system has been proposed for this application.
To achieve independence from sunlight and to
enhance accuracy, a multiple structured illumination source
with controllable intensity, wavelength, polarization, field of
view, and angles of incidence can be incorporated to alleviate
limitations in the vision systems being
proposed by many organizations [9,10,11]. In the development
of space vision systems, cost effectiveness, speed,
small size, light weight, high reliability and flexibility, and
ease of operation must be considered.

III. Vision System Concepts/Technology 

The initial use of vision for the Space Station could be to 
provide feedback to the human operator of the robotic struc- 
ture. In the initial vision systems, NASA anticipates the use 
of stereo televisions for label/feature based object recognition 
[12]. Color and 3-dimensional imagery will be important to 
both telepresence and robotic vision. NASA’s television pro- 
gram from Apollo through the Space Shuttle programs has 
been one of high crew and ground participation and control. 
Several limitations of these earlier video systems have already 
surfaced. One notable one was during the Solar Max Satellite 
repair when the shadowed surface could not be approached, 
and the grappling of the Solar Max was severely restricted
because of limitations with the manual camera light level con- 
trols. For robotic application, several features must be added 
to the presently available space television system. 
Specifically, these features include predictive auto focusing;
programmable predictive scene control with auto zoom, gamma, and
iris; and automated or voice-controlled pan, tilt, pointing, and
scene tracking capability. Illumination affects the scene
definition and is, therefore, critical to the robot’s performance.
The lighting
technique should involve a combination of artificial lights and 
natural sources (Figure 6). The parameters/performance for 
the artificial light should include programmable wavelength, 
polarization, intensity, and angle of incidence, as well as the 
number and positioning of these light sources. The natural 
incident light from sun, moon, Earth, and stars should be 
characterized in terms of both color and intensity on the sur- 
face of the scene/object which is being viewed. For Space 
Station, a light intensity of 100 footcandles at the working
surface is considered desirable [7]. For limiting glare, polar- 
ized filters can be used for both lights and cameras. The 
modes of lighting can include strobe lighting to eliminate 
motion blurring, infrared/ultraviolet illumination to reduce 
glare, and structured lighting to achieve 3-dimensional robotic 
vision [7]. 

Articulation mechanisms for cameras and lighting should 
incorporate flexibility. These mechanisms would form a 
significant part of the closed-loop control for end-effector
tracking, stereo camera adjustments, and autonomous operations.
In the case of stereo vision, the focal length, inter-camera
distance, and inter-camera angle must be automatically
adjusted to provide optimal stereo acuity. In the case of moving
objects/scenes, the vision articulation should provide the
feedback to determine the trajectory to a particular point on 
the object/scene. 

Several technology innovations are envisioned for making
television’s multi-features highly reliable [13]. Solid-state
cameras, based on charge-coupled device (CCD) or charge
injection device (CID) technology with variable spatial resolution,
with panels on the order of 1024 x 1024, can feature
automatic zoom and the ability to detect objects as small as 2 millimeters
at a distance of 1 meter. Furthermore, the density of
the sensing elements can be patterned after the eye with high 
resolution in the middle and low resolution in the peripheral 
vision. Brightness filters, automatic iris, and automatic gain 
control can be incorporated to allow handling of high intensity 
sunlight. Intensities of sunlight can range up to 13,000 footcandles,
necessitating bloom protection, large dynamic range, and
protection from burn-in [7]. Similarly, sensitive night/dark
vision modules with nonblooming characteristics are needed. 
In particular, very high quality and resolution imagery is 
needed to meet robotic vision requirements. In certain appli- 
cations, large format cameras capable of precise image men- 
suration at the pixel level may be required. In some applica- 
tions, design features favorable to telerobotic vision applica- 
tions may be incorporated to identify and determine 
aspect/attitude of various objects. Object shapes, color, and 
markings are some of the parameters useful for this purpose. 
Color vision offers an additional capability for the recognition 
of objects. The levels/shades of colors used in robotic vision 
can be substantially higher than can be distinguished by the 
human eye. Its immediate effect on the design of the video 
system is added complexity, data processing, and resultant 
cost. Furthermore, the rate of data transmission increases
significantly. Infrared cameras may be extremely beneficial in 
the location of objects in the dark and occluded areas. 
Although inherently low resolution, infrared imagery can pro- 
vide gross identification of the objects. 
The accuracy of television-based measurements is
adversely affected by the presence of the earth, sun, moon, 
and/or stars directly in the field of view (FOV) of the camera 
system. CID video implementations can allow reduction of 
image blooming and removal of the limited area in the FOV 
corresponding to the sun, moon, and/or stars. In the case of 
the extended background provided by Earth, selection of 
appropriate operating optical wavelength for the video system 
can reduce or eliminate this background radiation. A
wavelength of 0.94 µm provides an attenuation of 21.6 dB due
to water absorption in the earth's atmosphere [14]. Video sys- 
tems have been developed at JSC in this band for future use in 
space robotic applications [13]. 

Automation in TV operations can be incorporated by 
processing the imagery to determine parameters which are 
needed in the feedback loop. One such parameter is the location
of a particular object in the scene to be tracked as it moves.
Recognition of the object, along with the position of its centroid
as a function of time, is needed to point the TV at this object.
Accurate algorithms
for such parameter estimation in the presence of
rotation/motion and multi-object environment are presently 
under development at many institutions. A complementary 
and independent automation feature in space TV operation can 
use voice control. Such an implementation has been 
developed at JSC [15,13]. This voice control system (VCS) 
allows hands-off control of TV functions including: (1) mon- 
itor selection, (2) camera selection, and (3) pan, tilt, focus, 
iris, zoom, and scene track. Future use of voice has been projected
to include the EVA astronaut. In this application, the
astronaut can ask for Heads Up Display (HUD) of vital data 
from the Shuttle computers. The data can include system 
parameters, orbital parameters/location, system status, and 
particular subsystem data. The technology innovations for 
future VCS include speaker independence/user-trained, very 
large vocabulary, and isolated and continuous speech recogni- 
tion. 

The need for video data to be able to interface with digi- 
tal processors/computers has given impetus to digital TV tech- 
nology. For the solid state implementations using 
VLSI/VHSIC, recognition/preprocessing algorithms can be
implemented on the same electronics chips making the size of 
these video imagers small. As this technology is rapidly mov- 
ing forward, the need for handling and transmission of high 
data rates is becoming obvious. For a video system at 5 MHz 
baseband, 8-bit digitization at the Nyquist rate (10 Msamples/sec) would generate an 80 MBPS data
stream. For color TV implementations and multiple systems, 
this bit rate can multiply significantly. For real-time process- 
ing of this data, compression techniques have been proposed 
which can also be implemented by innovative chip designs. 
The compression algorithms should be automatic and transparent
to users, not destroy or discard any relevant information,
and compress/decompress data at speeds significantly
higher than the associated device data transfer speeds. 
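
The 80 MBPS figure quoted above follows from simple arithmetic if one assumes sampling at the Nyquist rate, that is, at twice the 5 MHz baseband. The Python sketch below reproduces that number and shows how the rate scales with additional channels. The function name and the three-channel color case are illustrative assumptions and are not taken from the systems described here.

```python
def raw_video_bit_rate(baseband_hz, bits_per_sample, channels=1):
    """Raw (uncompressed) bit rate in bits/sec, assuming Nyquist-rate sampling."""
    samples_per_sec = 2.0 * baseband_hz   # Nyquist rate: twice the baseband bandwidth
    return samples_per_sec * bits_per_sample * channels

mono = raw_video_bit_rate(5e6, 8)               # monochrome case from the text
color = raw_video_bit_rate(5e6, 8, channels=3)  # assumed: three full-bandwidth color channels

print(mono / 1e6, "Mbit/s (monochrome)")
print(color / 1e6, "Mbit/s (assumed three-channel color)")
```

Under these assumptions the rate grows linearly with the number of channels and cameras, which is why compression is raised as a requirement for real-time processing.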

Fourier optical processing offers a method of high speed 
parallel processing of data needed to support automation and 
robotics applications (Figure 7). The inherently parallel 
nature of optical processing, coupled with the easy and natural 
optical Fourier transform and the programmable masks, can 
obviate numerical processing for many applications. The 
masks are used to modulate the optical Fourier transform of
the input scene. An optical retransform then allows direct
detection of, say, the mathematical correlation between the 
viewed scene/data and the reference image/data. This scheme 
allows correlation or convolution computations in a rapid 
manner; the speed essentially controlled by the recalling of 
computer-memorized data and transfer of the input scene to 
the programmable masks. Texas Instruments, under 
NASA/JSC sponsorship, has fabricated an early prototype of 
this processor. Many improvements in the performance of 
these processors are envisioned for operational use. Designs 
are needed for Deformable Mirror Device (DMD) high spatial 
resolution, accuracy, and reduction of nonlinear effects caused 
by diffraction/scattering. The phase-only nature of these pro- 
cessors results in loss of correlation due to rotation and trans- 
lation of the object/scene. Work is in progress at JSC to incor- 
porate rotation-invariant filtering (directional filtering) for 
compensating the phase-only correlation effects. 
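
The optical correlator performs, in parallel, the operation that a digital system would carry out with two Fourier transforms and a pointwise product. The NumPy sketch below is a numerical analogue only and is not a model of the prototype hardware. The function name `correlate_fft`, the `phase_only` flag, and the synthetic scene are assumptions made for illustration.

```python
import numpy as np

def correlate_fft(scene, reference, phase_only=False):
    """Circular cross-correlation of scene with reference, computed via the FFT."""
    S = np.fft.fft2(scene)
    # Conjugate spectrum of the zero-padded reference plays the role of the mask.
    H = np.conj(np.fft.fft2(reference, s=scene.shape))
    if phase_only:
        H = H / np.maximum(np.abs(H), 1e-12)  # discard amplitude, keep phase only
    # The inverse transform corresponds to the optical "retransform" step.
    return np.real(np.fft.ifft2(S * H))

# Synthetic scene: a small textured target placed at row 20, column 37.
rng = np.random.default_rng(0)
target = rng.random((8, 8))
scene = np.zeros((64, 64))
scene[20:28, 37:45] = target

for po in (False, True):
    corr = correlate_fft(scene, target, phase_only=po)
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    print("phase_only =", po, "-> correlation peak at", tuple(int(i) for i in peak))
```

In this idealized, noise-free case the correlation peak marks the translation of the target in the scene for both the classical and the phase-only mask. The sketch does not address rotation, which is the case the directional filtering work is meant to handle.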

Another rotation-invariant methodology is the use of a 
transformation such as the logarithmic spiral grid for picture 
digitization to make it correspond to the human eye [16,17].
This spatial mapping from a high resolution imager to the 
input modulator in the correlator results in insensitivity to 
scale and rotation of a viewed object. Changes in scale and 
rotation of the input image become displacements in the corre- 
lation plane (Figure 8). Continued development of various 
mappings will result in the design of VLSI cameras whose 
receptor patterns are best suited to drive a subsequent optical 
correlator. 
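
The reason a logarithmic spiral (log-polar) grid turns scale and rotation into displacements can be checked numerically. In the sketch below, each point is mapped to (ln r, theta); scaling the point about the origin and rotating it then adds the same constant offset to every point. The function names and the sample points are illustrative assumptions, and the sketch assumes the scaling and rotation are about the center of the grid.

```python
import math

def to_log_polar(x, y):
    """Map Cartesian (x, y) to log-polar coordinates (ln r, theta)."""
    return math.log(math.hypot(x, y)), math.atan2(y, x)

def scale_rotate(x, y, scale, angle):
    """Scale a point about the origin and rotate it by angle (radians)."""
    c, s = math.cos(angle), math.sin(angle)
    return scale * (c * x - s * y), scale * (s * x + c * y)

scale, angle = 2.0, math.radians(30.0)
for x, y in [(1.0, 0.0), (0.5, 0.5), (3.0, 1.0)]:
    rho0, th0 = to_log_polar(x, y)
    rho1, th1 = to_log_polar(*scale_rotate(x, y, scale, angle))
    # Every point moves by the same offset: (ln scale, angle).
    print(round(rho1 - rho0, 6), round(th1 - th0, 6))
```

Because the offset is the same for all points, a translation-tolerant correlator operating on the remapped image sees a scaled or rotated object simply as a shifted one.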

Another processing technology can be based on neural 
networks. Neural networks are patterned after the neurons
in the human brain and can be termed learning networks.
Hardware implementations of neurocomputers have already
reached commercial markets. For example, the TRW Mark
III has 8100 processing elements and 417,000 interconnec- 
tions. As a coprocessor to the VAX computer, it speeds 
operations by a hundred times. Analog electronic neurons 
have been assembled and connected for the analysis and
recognition of acoustical patterns, including speech [18]. To
recognize speech/sounds at the phoneme or diphone level, the 
set of primitives belonging to the phoneme is decoded such 
that only a neuron or nonoverlapping group of neurons fire 
when the sound pattern is present at the input. The output 
from these neurons is fed into a decoder and computer which 
then displays the phonetic representation of the input speech. 
Similarly, neural network architectures/algorithms have been 
proposed to provide an anthropomorphic framework for analysis
and synthesis of learning networks for correlation, 
identification, and tracking applications [19]. The current 
technology allows 100-million processing elements along with 
100-million interconnects. By the late 1990’s, one-billion-neuron
networks, with 10 billion interconnects, can be projected as
technology in this area advances rapidly. Increases of the 
number of neurons on a single chip are projected to ten 
thousand. These analog neuro-network processors can then be 
directly used for video/scene recognition and mensuration. 
Images of space scenes/objects in certain aspects can be 
memorized on the neural networks. Based on this data, new 
aspect object views of the incoming data can be recognized 
using interpolation and extrapolation techniques. 
For 3-D vision, range information to each pixel can be
added to the video imagery. Laser scanning devices capable 
of giving angles/position and ranges to each pixel within the 
field of view provide the depth/height profile of the object. 
Included in these measurements can be the laser reflectance of 
the pixel. Two technologies of the solid-state laser vision 
devices currently available are those using mechanical motion 
of mirrored surfaces, and those that involve an inertialess 
change in the optical properties of the transparent medium. 
The latter class includes diffraction of light from an acoustically
generated periodic structure. Phased-array solid-state
scanning devices are also currently under development. These 
devices have the promise of providing fast, accurate, and 
lightweight laser vision. Another application of the laser 
range data can be in the automatic zoom/focus of video sys- 
tems. The laser vision measurements are dependent on the 
intensity of reflected radiation. If coherent radiation is used
to generate an image of the object from the
amplitudes, as well as the phases, of the scattered radiation, a
3-D reconstruction of the object can be made. Such devices 
are known as holographic devices. The source of coherent 
radiation can be a solid-state laser that can, in principle, provide
a resolution of the order of 1 µm. Part of the radiated
beam is deflected toward the detector (Figure 9), where it
interferes with the backscattered light from the object. The 
hologram can then be generated using known reproduction
processes, or a 3-dimensional description of the object can be
computed using, for example, a Fast Fourier Transform (FFT).
Several applications of holographic
scanners have been discussed by Sincerbox [20].
Some of these are directly applicable to space robotics sys- 
tems. 

Microwave systems have been used to detect relative 
speed of objects and their range in many applications. Their 
use in space robotics applications is being studied at 
NASA/JSC. In particular, millimeter wave radars provide 
attractive performance parameters in addition to their small 
size. The possibility of a broader beam than laser systems makes
these sensors attractive for initial acquisition of moving 
objects. A radar at the frequency of 100 GHz has been 
developed at NASA/JSC [21]. This particular system is for 
use on a Manned Maneuvering Unit (MMU) to provide relative
range and velocity to the object. The radar is designed to 
operate over the range of speeds from 0.1 to 2.0 fps. This type 
of radar operating at several carrier frequencies can be used to 
measure backscattering coefficients for various polarization 
combinations. These coefficients are object structure depen- 
dent. There is also the possibility of penetration through ther- 
mal protection and obscuration caused by nonmetallic objects. 
This data can be utilized in an interactive manner with that of 
the video systems to provide scene definition/parameters in 
certain situations/scenarios.
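
To give a sense of the signal processing involved, the sketch below estimates the Doppler shift that a 100 GHz radar would observe over the stated 0.1 to 2.0 fps speed range. It assumes a monostatic radar and the standard two-way, non-relativistic Doppler relation f_d = 2 v f / c. The source does not state how this particular radar derives velocity, so this is an order-of-magnitude illustration only, and the function name is hypothetical.

```python
C = 299_792_458.0   # speed of light, m/s
FT_TO_M = 0.3048    # feet to meters

def doppler_shift_hz(speed_fps, carrier_hz):
    """Two-way Doppler shift (Hz) for a target with radial speed speed_fps (ft/sec)."""
    return 2.0 * speed_fps * FT_TO_M * carrier_hz / C

for v in (0.1, 2.0):
    print(v, "ft/sec ->", round(doppler_shift_hz(v, 100e9), 1), "Hz")
```

Under these assumptions the shifts fall roughly between 20 Hz and 400 Hz, so resolving the lowest speeds calls for fine frequency resolution and correspondingly long observation times.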

Another promising microwave technique is time-domain
imaging, the synthesis of the scattered electromagnetic
field distribution over an object plane. The transmitted
pulse is an impulse source offering higher instantaneous 
signal-to-noise ratio, higher resolution, and option for echo 
time gating. Technology advances in the fast pulse generators 
and sampling devices allow the fabrication of time-domain 
imagers in the picosecond region with measurement and recording
of both phase and amplitude of the returned/transmitted
signal. The scattered signal can be formulated as the
convolution of the source with the transmitter, receiver, and
scatterer responses [22]. From a set of time-domain responses 
obtained from different viewing directions, the two- 
dimensional field distribution is synthesized using a technique 
similar to the one in tomography [23]. The resultant image 
closely resembles the object geometry [22]. Thus, time- 
domain impulse imaging is another tool in extracting physical 
information about the object.
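
The convolution model of the scattered signal can be sketched in discrete time as a cascade of impulse responses. The NumPy example below uses made-up transmitter, receiver, and scatterer responses (all assumptions, chosen only to make the arithmetic easy to follow) and does not attempt the tomographic synthesis step described in [23].

```python
import numpy as np

def received_signal(source, h_tx, h_scatterer, h_rx):
    """Cascade the source pulse through transmitter, scatterer, and receiver responses."""
    y = np.asarray(source, dtype=float)
    for h in (h_tx, h_scatterer, h_rx):
        y = np.convolve(y, h)   # full linear convolution at each stage
    return y

source = np.array([1.0])                    # idealized impulse source
h_tx = np.array([1.0, 0.5])                 # assumed transmitter response
h_rx = np.array([1.0, 0.25])                # assumed receiver response
h_scatterer = np.zeros(12)
h_scatterer[3], h_scatterer[9] = 1.0, 0.6   # two assumed scattering centers

y = received_signal(source, h_tx, h_scatterer, h_rx)
print(len(y), np.round(y, 3))
```

Each scattering center appears in the output as a delayed, scaled copy of the combined transmitter and receiver response. With known system responses, echo time gating amounts to selecting the window of samples around the delay of interest.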

Space telerobotic and autonomous robotic operations 
have to be monitored and controlled remotely without the 
availability of hardline power and data services. The 
robots/end effectors must reach small, crowded, or restricted 
space. The communications system on the robot has to be 
able to transmit multiple channels of high-quality video and 
high rate data from other sensors. In most implementations, it 
should be able to receive high-rate data from the control and 
monitoring station. The coverage for the robot/end effector 
should be spherical without blockage/interference from the 
system. Furthermore, the time delays through processors, 
prime power, size, and weight should be minimized. In view 
of these desired performance goals, higher frequencies would
be attractive for communication systems. The bands could 
include millimeter wave and optical/infrared. Infrared/Laser 
communications offer unique advantages which are being 
explored at JSC. The design features include multi-access, 
packetized, high-rate, broad-beam links. 

This section has dwelt on a broad set of concepts for sys- 
tem implementations for robotic vision. Active and passive 
sensors in microwave, optical and infrared bands, along with 
high-rate communications systems, are needed for various 
vision applications. Superconductivity devices/systems will 
have a significant impact on the vision systems design and 
performance. Examples of systems benefiting from this tech- 
nology would be: (1) millimeter wave high efficiency distri- 
buted antennas with broadband and large beamwidth perfor- 
mance, (2) microwave power switches, networks, and distri- 
bution circuits resulting in substantial reduction in power loss, 
and increase in bandwidth and sensitivity, (3) development 
of optical and infrared detector cameras for low level/dark 
sensing, and (4) development of programmable signal processors,
neural networks for speech and scene recognition, and
communications monitoring/control processors.

IV. Information Processing Algorithms 

The processing of video data at various information lev- 
els, spanning from the data level to the intelligence level, is 
driven by algorithms. As mentioned earlier, the result of this
processing is to provide a human operator or a robot with the
parameters needed to control a mechanism. In other instances
the processing leads to a high level description/interpretation
of the observed scene for consumption by a human or robotic
supervisor. 

In a given application the set of vision algorithms may be 
grouped into three stages, depicted in Figure 10 and explained
in detail in [24]. These three stages provide a meaningful
rationale for a CAD-based vision system currently under development
at Rice.

(1) The first stage is an Image Preprocessing Stage (IPS),
which maps the noisy pixel data into a set of labeled features.
Typically, there are three types of features, namely, corners or
vertices, edges, and shaded regions corresponding to surface
patches on the imaged object. In a given application one may
use any one of these types of features or a combination of
some or all of them. For example, a common set of features is
the wireframe $W_f$ of an image $I$ of an object $O_m$. The
wireframe $W_f$ consists of the set of edges and corners present
in the image $I$. The wireframe of the image of a cube is
shown in Figure 11(a). From now on, we will assume that the
output of the IPS is a labeled wireframe. A typical sequence
of algorithms constituting the IPS would be algorithms for 
Gaussian filtering, Sobel operation, thresholding, median 
filtering, and contour thinning. Other sequences of image pro- 
cessing algorithms can be selected, depending on the type of 
the image data used. The wireframe of a mock-up of the 
Space Shuttle of Figure 12 extracted using these operations is 
shown in Figure 13. 
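
The first part of such an IPS sequence can be sketched on a tiny synthetic image. This is a minimal illustration under stated assumptions, not the implementation used at Rice: a 3x3 binomial kernel stands in for the Gaussian filter, the median-filtering and contour-thinning steps are omitted for brevity, and the function names and the threshold value are hypothetical.

```python
import math

# 3x3 binomial kernel, a common small approximation to a Gaussian (assumption).
GAUSS = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def correlate3(img, kernel, scale=1.0):
    """Apply a 3x3 kernel; border pixels are handled by clamping indices."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            acc = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr = min(max(r + dr, 0), h - 1)
                    cc = min(max(c + dc, 0), w - 1)
                    acc += kernel[dr + 1][dc + 1] * img[rr][cc]
            out[r][c] = acc * scale
    return out


def edge_map(img, threshold):
    """Smooth, take the Sobel gradient magnitude, and threshold it."""
    smooth = correlate3(img, GAUSS, 1.0 / 16.0)
    gx = correlate3(smooth, SOBEL_X)
    gy = correlate3(smooth, SOBEL_Y)
    return [[1 if math.hypot(gx[r][c], gy[r][c]) > threshold else 0
             for c in range(len(img[0]))] for r in range(len(img))]


# Synthetic 6x8 image: dark left half, bright right half (vertical step edge).
image = [[0.0] * 4 + [100.0] * 4 for _ in range(6)]
edges = edge_map(image, threshold=50.0)
for row in edges:
    print("".join(str(v) for v in row))
```

On this step image the thresholded edge comes out several pixels wide, which illustrates why a contour-thinning step follows thresholding in the sequence listed above.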

(2) The second stage is what we call a Symbolic
Transformer (ST) (see Figure 10), which maps the labeled
wireframe $W_f$ into an attributed graph $G(W_f)$. One way of
converting $W_f$ into $G(W_f)$ is to map each face $F_i$ (the
region enclosed by a mesh in the wireframe) into the node $N_i$
of the attributed graph $G(W_f)$ and the edge defining the boundary
between two adjoining faces, say $F_i$ and $F_j$, into the
link $l_{ij}$ of the graph. Referring to Figure 11, the
wireframe $W_f$ of the visible surface of
the cube of Figure 11(a) is mapped into the subgraph within
the dotted curve (consisting of the nodes $N_1$, $N_2$, and $N_3$
and the links joining these nodes to each other) of the graph of
Figure 11(b). The whole graph of this figure corresponds to
the wireframe of the entire surface (visible and occluded) of
the cube. In such a symbolic representation, we assign an
attribute (feature) vector to each node and each link in the
graph. Thus, let an m-vector $I^i = \mathrm{col}(I^i_1, \ldots, I^i_m)$ represent a
set of attributes (features) associated with the face $F_i$ which
are invariant under 3D translations, scalings, and rotations.
Examples of such $I^i$ are the sets of numbers expressing the
Gaussian curvature or mean curvature of $F_i$. We call the
attributed graph obtained from $G(W_f)$ by assigning to
the nodes $N_i$, $i = 1, \ldots, n$, respectively, the attribute
vectors $I^i$, the FIAG (Feature-Invariants/Attributed-Graph) [25]
representation of the object. A special case of this
representation is the MIAG (Moment-Invariants/Attributed-Graph)
representation of polyhedral objects proposed in [26],
in which $I^i$ constitutes a set of 2D moment invariants of $F_i$
(these being invariant with regard to 3D translations, scalings,
and rotations).
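
The face-to-node and shared-edge-to-link mapping can be sketched for a cube as follows. This is an illustrative data-structure sketch, not the ST used at Rice: the vertex numbering, the node names, and the placeholder attribute vector (the number of sides of each face, standing in for true invariants such as moment invariants) are all assumptions.

```python
from itertools import combinations

# Cube vertices are numbered v = 4x + 2y + z for x, y, z in {0, 1}.
# Each face lists its four corners in order around the face.
CUBE_FACES = {
    "N1": (4, 5, 7, 6),  # x = 1
    "N2": (2, 3, 7, 6),  # y = 1
    "N3": (1, 3, 7, 5),  # z = 1
    "N4": (0, 1, 3, 2),  # x = 0
    "N5": (0, 1, 5, 4),  # y = 0
    "N6": (0, 2, 6, 4),  # z = 0
}


def face_edges(face):
    """Return the set of undirected edges bounding a face."""
    n = len(face)
    return {frozenset((face[k], face[(k + 1) % n])) for k in range(n)}


def attributed_graph(faces, attributes):
    """Nodes are faces; a link joins two faces that share a boundary edge."""
    nodes = {name: attributes[name] for name in faces}
    links = {}
    for a, b in combinations(sorted(faces), 2):
        shared = face_edges(faces[a]) & face_edges(faces[b])
        if shared:
            links[(a, b)] = shared
    return nodes, links


# Placeholder attribute vectors (assumption): number of sides per face.
attrs = {name: (len(face),) for name, face in CUBE_FACES.items()}

nodes, links = attributed_graph(CUBE_FACES, attrs)

# Subgraph for the three faces meeting at vertex 7, i.e. the visible faces.
visible = {name: CUBE_FACES[name] for name in ("N1", "N2", "N3")}
vis_nodes, vis_links = attributed_graph(visible, attrs)

print(len(nodes), len(links))          # full cube graph
print(len(vis_nodes), len(vis_links))  # visible-surface subgraph
```

The full graph has one link per cube edge, while the visible-surface subgraph keeps only the three nodes and the three links between them, mirroring the dotted-curve subgraph of Figure 11(b).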

(3) The third and final stage (see Figure 10) is a set of 
High-Level-Processors (HLP’s) that map the attributed graph 
into appropriate symbols or vectors giving the information 
that the vision system is required to provide to the human 
operator or robot. Thus, in Figure 10, HLP 1 identifies the
object being viewed as being the object $O_m$ among the objects
present in the computer library. In other words, HLP 1 is
embodied by an identification algorithm. HLP 2 and HLP 3
constitute implementations of algorithms for determination of
the position and orientation of the object, respectively, etc.

Having outlined the general framework for the various 
algorithms constituting the vision system, we now focus attention
on some of the algorithms making up the individual
HLP’s mentioned above.


(a) Object Identification/Recognition. The
Moment-Invariants/Attributed-Graph (MIAG) algorithm [26] for
recognition of 3D objects from a single picture has been
successfully developed and tested [27]. The algorithm works for
polyhedral objects, and its generalization for nonpolyhedral 
objects has been indicated [25]. Each face of a polyhedron 
can be considered to be a rigid planar patch (RPP). Motion of 
the object can be considered to be motion of its constituent 
RPP’s. In the case of parallel projection, if an RPP undergoes 
rigid body motion in 3D, its image undergoes affine transfor- 
mations. So the method which tries to identify an object in 
3D motion should use features of images which remain invari- 
ant under affine transformations. General moment invariants 
introduced in [26] are such features. These are invariants of 
2D (rigid planar patch) moments which remain invariant 
under 3D translations, rotations, and scalings. Identification 
of an object is achieved by matching the attributed graph of its 
image (see Fig. 11(a)) to a subgraph of one of the graphs 
corresponding to the models stored in the computer library. 
The algorithm matches a pair of nodes by comparing the 
Euclidean distance between their feature vectors. Thus, if
$I = (I_1, I_2, I_3, I_4)$ is the feature vector of a node consisting
of four moment invariants of its corresponding face, and
$I' = (I'_1, I'_2, I'_3, I'_4)$ the feature vector of the node to which
it is being matched, the distance between them is taken to be

$$ d = \sqrt{p_1 (I_1 - I'_1)^2 + p_2 (I_2 - I'_2)^2 + p_3 (I_3 - I'_3)^2 + p_4 (I_4 - I'_4)^2} \qquad (1) $$

where the $p_i$'s are appropriate weighting factors. The driver
algorithm arbitrarily picks a node $N_i$ in the image attributed
graph; then it looks for a node $O_j$ in the model graph with the
same feature vector. If matched, these nodes are marked as a
pair, an adjacent node in the image graph is chosen, and
the nodes adjacent to $O_j$ are scanned to see if it matches one
of them. In practice, after a few node matchings, a unique
identification is achieved.
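
The node comparison of equation (1) can be sketched as follows. This is a minimal illustration, assuming that "the same feature vector" is tested in practice with a distance tolerance; the feature values, the weights, the tolerance, and the function names are all hypothetical, and the adjacency-driven scan of the driver algorithm is not reproduced.

```python
import math


def weighted_distance(feat_a, feat_b, weights):
    """Weighted Euclidean distance of equation (1)."""
    return math.sqrt(sum(p * (a - b) ** 2
                         for p, a, b in zip(weights, feat_a, feat_b)))


def best_match(image_feat, model_nodes, weights, tol):
    """Return (name, distance) of the closest model node, or (None, distance)
    if even the closest node is farther away than the tolerance."""
    name, dist = min(
        ((n, weighted_distance(image_feat, f, weights))
         for n, f in model_nodes.items()),
        key=lambda pair: pair[1])
    return (name, dist) if dist <= tol else (None, dist)


# Hypothetical feature vectors (four moment invariants per face).
model_nodes = {
    "O1": (0.10, 0.20, 0.30, 0.40),
    "O2": (0.50, 0.10, 0.90, 0.20),
}
weights = (1.0, 1.0, 1.0, 1.0)        # p_1..p_4, chosen arbitrarily
image_node = (0.11, 0.19, 0.30, 0.41)  # slightly noisy copy of O1

match, d = best_match(image_node, model_nodes, weights, tol=0.05)
print(match, round(d, 4))
```

A tolerance is used here because measured invariants are never exactly equal to the stored ones; how the weights and tolerance should be chosen is not specified in the text and is left open.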

(b) Motion Parameter Estimation. Using appropriate 
camera calibration, all the motion parameters (position, velo- 
city, attitude, and attitude rate) except for a scaling factor, can 
be determined by means of a single high precision camera. 
For this purpose, there are basically two model-based methods 
available: One, based on the contraction of the moment tensors
of a surface patch of the model and its image, determines
the attitude vector $\theta$ (roll, pitch, and yaw) and attitude
velocity $\dot{\theta}$. (See [25] and [26] for details.) The other, based on
the correspondence (assumed known) of eight points on the
image $I$ and eight points on the model (assumed located
and oriented in a standard position), yields all the motion
parameters except for a scaling parameter. This second method
has been extensively discussed by Longuet-Higgins [28], Tsai 
and Huang [29], Haralick, et al. [30], and others, and recently 
extended to the case in which both the object and camera are 
moving by Fotedar, et al. [31]. 
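
The scale ambiguity of the eight-point method can be seen from the conventional form of the constraint it solves. The notation below is the standard one and is assumed here rather than taken from the paper: $x_k$ and $x'_k$ are corresponding points in normalized homogeneous coordinates, $R$ is the rotation, $t$ the translation, and $[t]_{\times}$ the skew-symmetric matrix of $t$.

```latex
x_k'^{\top} E \, x_k = 0, \quad k = 1, \ldots, 8,
\qquad
E = [t]_{\times} R
```

Each correspondence gives one linear equation in the nine entries of $E$. Because the equations are homogeneous, $E$ (and therefore $t$) is recovered only up to a scale factor, which is the scaling parameter that remains undetermined.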

(c) CAD-Based Vision. What we have described above 
constitutes a framework for CAD-based vision (see [24] for 
details). The current CAD-based systems are driven by 3D 
geometric modeling procedures originally developed for the 
representation and manipulation of objects in a design or com- 
puter graphics environment [32-34]. The system under 
development at Rice, based on the representations described 
above, will be fully compatible with the requirements of a
vision system. 

There are three other areas of algorithmic development 
related to vision which are of special interest and the work on 
which is being actively pursued. These are described below. 

(1) Shape Extraction Based on Illumination. As pointed
out earlier, the fact that near vacuum prevails in space 
scenarios makes the scattering of light by a surface strongly 
dependent on the surface properties. Thus, by using appropri- 
ate mathematical models for the surface and for the 3D illumi- 
nation conditions, it is possible to design algorithms that pre- 
cisely determine the shape of the surface by the shadowing 
caused by illumination as well as any changes that have 
occurred on the surface conditions. 

We note that a variety of methods have been developed 
for extracting shape based on camera data, each working 
under a different set of conditions and using different clues for 
reconstructing the surface. Thus, Stereo Vision uses the 
disparity between two images of the same object from two 
cameras to reconstruct the surface. Methods based on Struc- 
tured Lighting project a known pattern of light on the surface 
and reconstruct the surface by looking at the distortion of this 
pattern. Shape from Shading is based on the premise that sur- 
faces reflect different light intensities depending on the rela- 
tive orientation of the surface to the light source and the 
observer. In principle, knowing the form of this dependence 
and the amount of light actually reflected back, the surface 
orientation can be calculated (see [35] and references therein). 

Two methods have emerged for extracting shape from 
shading. One is called the proper "Shape from Shading"
method while the other is termed "Photometric Stereo."
Classical shape from shading techniques reconstruct the object 
from a single photograph of the object when the light source is 
placed in a known position. Photometric stereo involves 
reconstructing the object from multiple images of the object 
taken by moving the light source to different positions, while 
the position of the camera remains fixed. One advantage
that these methods offer is high-resolution surface
reconstruction, as typically needed in assembly-line operations.
Handling of new parts, object testing for tolerance, and
estimating and repairing structural damage are some tasks
which need a high-resolution surface reconstruction front end.
Another possibility is the design of a front-end visual system
for feeding surface models of real-world objects to a CAD
system.
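
The photometric stereo idea can be sketched for a single pixel as follows. The sketch assumes a Lambertian surface, three distant point sources with known unit directions, and no shadowing at the pixel; none of these assumptions is stated in the text, and the function names and numerical values are hypothetical. Under the Lambertian assumption each intensity is the albedo times the dot product of the surface normal with the light direction, so three images give a 3x3 linear system.

```python
import math


def det3(m):
    """Determinant of a 3x3 matrix given as nested sequences."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(a, b):
    """Solve the 3x3 system a x = b by Cramer's rule."""
    d = det3(a)
    if abs(d) < 1e-12:
        raise ValueError("light directions are (nearly) coplanar")
    solution = []
    for col in range(3):
        replaced = [[b[r] if c == col else a[r][c] for c in range(3)]
                    for r in range(3)]
        solution.append(det3(replaced) / d)
    return solution


def normal_and_albedo(lights, intensities):
    """Recover the unit normal and albedo at one pixel from three images."""
    g = solve3(lights, intensities)           # g = albedo * normal
    albedo = math.sqrt(sum(v * v for v in g))
    return [v / albedo for v in g], albedo


# Three unit light directions (rows) and the intensities they would produce
# for a surface facing the camera with albedo 0.5.
lights = [(0.0, 0.0, 1.0), (0.6, 0.0, 0.8), (0.0, 0.6, 0.8)]
intensities = [0.5, 0.4, 0.4]

normal, albedo = normal_and_albedo(lights, intensities)
print([round(v, 6) for v in normal], round(albedo, 6))
```

The system becomes singular when the three light directions are coplanar, which is one reason the placement of the light sources matters in this method.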

Stereo vision and structured lighting methods have an 
inherent matching problem (used in generating the disparity 
map and the line matching) that is as yet unsolved. This prob- 
lem is absent in shape from shading methods. Furthermore, in 
comparison with the structured lighting methods, shape from 
shading methods offer the advantage that the whole surface of 
interest is imaged. No shadows are willfully projected on it. 

In shape from shading algorithms, the characteristic strip 
expansion methods [35] have several shortcomings, including 
sensitivity to measurement noise and a tendency of adjacent 
characteristic strips to cross over each other, due to accumula- 
tion of small numerical errors. Finally, the procedure is not 
amenable to implementation in parallel form. The variational 
method [35] that uses an object’s occluding boundaries as
cues to the recovery of its shape from shading alleviates these
limitations. The blending of concepts from variational cal- 
culus with those from the best approximation theory can lead 
to spline-based solutions for the gradient functions determin- 
ing the local surface shape orientation, as obtained in [36]. 
Research is also under way at Rice [37] to investigate certain 
aspects of Photometric Stereo such as completeness of illumi- 
nation, optimal light placement, and robustness with respect to 
noise. 

(2) Shape Extraction from Sparse Range Data. Our
second set of algorithms for the extraction of shape of 3D
objects is based on sparse range data. A new
methodology for surface reconstruction from such data was
recently developed by Kehtarnavaz, et al. [38]. Such a reconstruction
is formulated in terms of three separate subproblems:
(i) 3D contour segmentation, (ii) segment matching,
and (iii) surface patch formation. This framework is based on
a syntactic/semantic criterion which incorporates the shapes
of the contours in creating the surface. First, the contours are
divided into sets of 3D curve segments in order to distinguish
local shapes or substructures in the contours. Then, curve
segments with similar shape characteristics are found on
adjoining contours. Finally, parametric surface patches are formed
between the matched pairs of curve segments on adjoining
contours by appropriately blending them. Typical reconstructions
obtained by this method are illustrated in Figures 14 and
15.
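
The final patch-formation step can be sketched as follows. This is only an illustration of blending two matched curve segments: it uses a simple linear blend as a stand-in for the blending functions of [38], it assumes both segments have already been sampled at the same number of corresponding points, and the function name and sample coordinates are hypothetical.

```python
def blend_patch(seg_a, seg_b, n_rows):
    """Linearly blend two matched 3D curve segments into a grid of points.

    Row 0 reproduces seg_a, the last row reproduces seg_b, and the rows in
    between are weighted averages of corresponding points.
    """
    if len(seg_a) != len(seg_b):
        raise ValueError("matched segments must have the same sample count")
    if n_rows < 2:
        raise ValueError("need at least two rows")
    patch = []
    for k in range(n_rows):
        v = k / (n_rows - 1)
        patch.append([tuple((1 - v) * a + v * b for a, b in zip(pa, pb))
                      for pa, pb in zip(seg_a, seg_b)])
    return patch


# Two matched segments on adjoining contours (hypothetical range samples).
segment_low = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
segment_high = [(0.0, 1.0, 2.0), (1.0, 2.0, 2.0), (2.0, 1.0, 2.0)]

patch = blend_patch(segment_low, segment_high, n_rows=3)
for row in patch:
    print(row)
```

A linear blend gives a ruled surface between the two contours; smoother blending functions would be needed to obtain continuity of slope across neighboring patches.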

(3) Shape Extraction by Sensor Fusion. Although radar
scattering cross-sections alone cannot provide a complete 
description of a scattering surface, they are very useful when 
used to complement optical images, providing information in 
those regions of the object where a camera is blind due to 
phenomena like specular reflection. 

The specular point on a curved surface is a point at 
which the angle of reflectance equals the angle of incidence. 
In traditional shading models, highlights occur at these 
points. In space, these highlights are so disproportionately 
intense that they tend to obscure the surrounding details of the 
surface. This phenomenon can be traced to two causes: the 
Airy disk and blooming.

The Airy disk, or ring, is an optical term for the first diffraction
fringe surrounding the image of a point source
transmitted through an aperture. Because lenses constitute 
finite apertures, these rings are present to some degree in all 
imaging systems [39]. Usually, when dealing with incoherent 
light emanating from curved objects, the Airy disks of a distri- 
bution of point sources tend to cancel each other out and are 
not visible in the resulting image. However, intense specular 
reflections become point sources which are orders of magni- 
tude more intense than the surrounding reflections. The 
resulting diffraction fringe is highly visible and can wipe out 
the shading information in adjacent areas of the image. 
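
The size of this diffraction pattern follows from the standard result for a circular aperture. In the expression below, $\lambda$ is the wavelength, $D$ the aperture diameter, and $f$ the focal length; the numerical values in the example are assumptions chosen only for illustration.

```latex
\theta_1 \approx 1.22\,\frac{\lambda}{D},
\qquad
r_1 \approx 1.22\,\frac{\lambda f}{D}
```

Here $\theta_1$ is the angular radius of the first dark ring and $r_1$ its radius in the focal plane. For example, with $\lambda = 0.55\ \mu\mathrm{m}$ and $f/D = 8$, $r_1 \approx 1.22 \times 0.55 \times 8 \approx 5.4\ \mu\mathrm{m}$, which is comparable to the size of a single detector element in many cameras.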

Blooming occurs at points of high intensity in a televi- 
sion image. If the distribution of intensities is relatively even, 
this phenomenon is not a problem. If specular points occur 
whose intensities contrast sharply with their surroundings, the 
effect is noticeable. The anomalously high grid voltage in the 
camera cathode ray tube causes the electron beam to spread. 
The result is specular points which are smeared over several 
pixels. Blooming can obscure small features surrounding
specular points in the image of a highly reflective surface.

Space images also suffer from indistinct edges. Because
of the complete lack of illumination on the shadow side of 
space objects, their edges are invisible against the dark back- 
ground. This can result in false edges for curved objects, or 
the absence of one or more edges on polyhedrons. 

The sensor fusion algorithm under development at Rice 
[40] reconstructs 3D space objects (whose images may be
degraded as described above) given observations taken by a
microwave radar system from a solitary remote point. Since
microwave or millimeter-wave radar systems are currently
found on a variety of space vehicles, radar scattering information
would seem to be a logical addition to a space robot’s
sensory data. The unknown portion of the scattering surface is
parametrically approximated with splines so that the
microwave scattering equations can be used to derive the unknown
surface. In this way the radar cross-sections are used to
reconstruct those portions of optical images which are destroyed
by high intensity specular reflections (see Figure 16).
Edges which are lost in shadows can be inserted in a similar 
fashion. The solution procedure is an iterative non-linear 
least squares algorithm [41], using the incomplete optical sur- 
face to provide a first approximation to the actual surface 
parameters. A surface model is generated at each step in the 
algorithm and approximations to the co- and cross-polarized 
scattering cross-sections are computed from this model. A 
Physical-Optics approximation to the Jacobian is then used to 
update the unknown surface parameters for the next iteration. 
When the best possible surface is obtained, the least-squares 
algorithm is terminated and the new surface, with the 
degraded portions filled in, may be passed back to the optical 
shape-from-intensity algorithm for further refinement. Thus, 
we are fusing the optical image sensors with polarized 
microwave radar cross-sections to arrive at a target object 
characterization which is more complete than either of those 
derived from the image or radar separately. 
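
The structure of such an iterative non-linear least squares update can be sketched with a Gauss-Newton loop. This is a toy illustration only: a two-parameter exponential stands in for the scattering model, its analytic Jacobian stands in for the Physical-Optics approximation, and the starting guess plays the role of the first approximation supplied by the incomplete optical surface. None of the names or values below comes from [40] or [41].

```python
import math


def forward(params, xs):
    """Toy forward model (assumption): y = a * exp(-b * x)."""
    a, b = params
    return [a * math.exp(-b * x) for x in xs]


def jacobian(params, xs):
    """Partial derivatives of the toy model with respect to a and b."""
    a, b = params
    return [(math.exp(-b * x), -a * x * math.exp(-b * x)) for x in xs]


def gauss_newton(measured, xs, params, max_iter=20, tol=1e-10):
    """Refine (a, b) by repeatedly solving the 2x2 normal equations."""
    a, b = params
    for _ in range(max_iter):
        predicted = forward((a, b), xs)
        res = [m - p for m, p in zip(measured, predicted)]
        jac = jacobian((a, b), xs)
        # Normal equations (J^T J) delta = J^T r for two unknowns.
        s_aa = sum(ja * ja for ja, _ in jac)
        s_ab = sum(ja * jb for ja, jb in jac)
        s_bb = sum(jb * jb for _, jb in jac)
        g_a = sum(ja * r for (ja, _), r in zip(jac, res))
        g_b = sum(jb * r for (_, jb), r in zip(jac, res))
        det = s_aa * s_bb - s_ab * s_ab
        da = (g_a * s_bb - s_ab * g_b) / det
        db = (s_aa * g_b - s_ab * g_a) / det
        a, b = a + da, b + db
        if math.hypot(da, db) < tol:
            break
    return a, b


xs = [0.0, 1.0, 2.0, 3.0]
measured = forward((2.0, 0.5), xs)      # noise-free synthetic "measurements"
a_hat, b_hat = gauss_newton(measured, xs, params=(1.5, 0.3))
print(round(a_hat, 6), round(b_hat, 6))
```

As in the procedure described above, the quality of the starting guess matters: Gauss-Newton converges quickly when the initial parameters are close to the solution but may fail otherwise.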

V. Proposed Future Developments 

As was mentioned earlier, the interaction of natural light 
with the objects in space has to be accounted for in the vision 
algorithms. Furthermore, artificial light(s) arrangements have 
to be developed which can provide structured (known distribution)
light across the object. The pronounced shadows and
specular points, due to the vacuum and smooth parts of the
object, provide a large dynamic range of the
reflected/scattered signal. The intensity changes can be in the 
10 range. The addition of artificial illumination provides the 
opportunity to control intensity, wavelength, polarization, and 
orientation with attendant advantages of increased recogni- 
tion. Additionally, color will be another discriminant 
involved in the recognition. Analytical studies in these areas 
should lead to the design of illumination systems for space 
applications. 

The use of laser vision and microwave scattering instru- 
ments creates another area of future development. Fast 
scanning/holographic lasers provide a depth perception of 
objects, which is quite complex. This depth data can be utilized
to iteratively provide a 3-D image of the object by
weighting video-acquired image data appropriately. These
weights will depend on the surface curvatures as they project
in the incidence direction of the laser beam. Both empirical
and analytical studies are needed. The microwave back- 
scattering can provide another independent set of data. The 
shape of certain objects can be directly deduced from this 
data. In many inspection tasks in which a nonmetallic shield- 
ing has obscured the view, such sensing will be mandatory. In 
other situations, the microwave data can be iteratively used 
with that of TV to arrive at a more definitive description of the
object. At the expense of complexity, Doppler processing of
microwave and laser data can be used to discriminate moving 
parts of a distant object. The advantage of such a vision is 
that it is independent of sunlight and it provides a direct meas- 
ure of range and relative velocity of various parts of the 
object. 
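
The relation behind this direct velocity measurement is the standard Doppler shift for a sensor that transmits and receives from the same location. The symbols are conventional and assumed here: $v_r$ is the radial (line-of-sight) velocity of the reflecting part, $\lambda$ the transmitted wavelength, $f_0$ the transmitted frequency, and $c$ the speed of light.

```latex
f_d = \frac{2 v_r}{\lambda} = \frac{2 v_r f_0}{c}
```

As an illustration with assumed values, a part closing at $v_r = 3$ m/s observed by a 35 GHz millimeter-wave sensor ($\lambda \approx 8.6$ mm) produces a shift of about 700 Hz, while the same motion observed with a laser produces a far larger shift because of the much shorter wavelength.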

Another area of endeavor should be time-domain imag- 
ing. A sharp pulse transmitted yields a unique description of 
the object. This time-domain reflectometry is evolving
rapidly. Another mode of the system can utilize reflectivity 
data in the near-field. These techniques have not been 
explored for the robotic vision applications. 

Finally, further research and development is needed in 
the area of multisensor coordination and fusion. The recogni- 
tion algorithms are to be extended to include interrelating data 
from several cameras, laser scanners/holographic systems, and 
microwave sensors. These algorithms should include motion, 
rotation, and object changes as functions of space and time. 
In addition to this, "environmental" data pertaining to the 
events/objects and their status, has to be included. These 
aspects, along with rational models, incorporate expert and 
artificial intelligence techniques in the scene analysis. The 
goal should be a multisensor, multimode vision system capa- 
ble of autonomous operation and self-calibration. 

VI. Conclusions

This paper is aimed at providing a review of some of the 
efforts in progress at NASA/JSC and Rice University. The 
design and development of vision systems for space applications
require several considerations that distinguish them from
systems used in ground applications. The concerns
for space-unique vision systems have been elaborated.
Several efforts which need to be undertaken have been dis- 
cussed. Considerable work has to be accomplished in order to 
provide robust, lightweight, small size, and autonomous vision 
systems for specific space applications. 

FIGURE 1. CONCEPTUAL SPACE STATION ROBOTICS/AUTOMATION

FIGURE 2. FUNCTIONAL ELEMENTS OF TELEOPERATOR/AUTONOMOUS SYSTEM

FIGURE 3. ROBOTICS SERVICING CONCEPTS (PANELS: SERVICE ON SPACE STATION CONCEPT; FREE FLYER SERVICE CONCEPT)

FIGURE 4. CAPTURE STOPPING DISTANCES AND VELOCITIES FOR VARIOUS FORCES

FIGURE 5. NASA/JSC EVA RETRIEVER VISION SYSTEM CONCEPT

FIGURE 6. SPACE ILLUMINATION FOR SHAPE DEFINITION/IDENTIFICATION

TABLE 1. LASER DOCKING SYSTEM PERFORMANCE GOALS

PARAMETER                  LIMITS                    ACCURACY (1σ)
RANGE (R)                  0-1 km (3280 ft)          .01 R; 2.5 mm < 10 m
RANGE RATE                 ±3 m/s (±10 ft/s)         .0001 R/s; 3 mm/s < 30 m
POINTING                   ±π/2 rad (±90°)
BEARING ANGLE              ±.2 rad (±10°)            3 mrad (.2°)
BEARING ANGLE RATE         ±20 mrad/s (±1°/s)        .03 mrad/s (.002°/s)
ATTITUDE (P,Y)             ±.5 rad (±28°)            7 mrad (.3°)
ATTITUDE (R)               ±π rad (±180°)            7 mrad (.3°)
ATTITUDE RATE              ±20 mrad/s (±1°/s)        .03 mrad/s (.002°/s)
R, Ṙ OUTPUT DATA RATE      1 Hz
ANGLE OUTPUT DATA RATE     3.125 Hz

Accuracy condition: at R < 100 ft.

FIGURE 7. OPTICAL PROCESSING FOR CONTROL APPLICATIONS (PANEL LABELS: ROTATIONAL INVARIANCE; MAGNIFICATION INVARIANCE)

FIGURE 8. COORDINATE TRANSFORMATION/MAPPING FOR ROBOTIC VISION

FIGURE 9. CONCEPTUAL SCHEME FOR 3-D IMAGE PROCESSING BASED ON HOLOGRAPHY (LABEL: DETECTOR)

TESTBED FOR SHAPE FROM SHADING ALGORITHM VERIFICATION

FIGURE 10. THE THREE STAGES OF THE VISION SYSTEM DESCRIBED IN THE TEXT

FIGURE 11. THE WIREFRAME OF A CUBE AND ITS ATTRIBUTED GRAPH REPRESENTATION (PANEL: (a) Object)

FIGURE 12. LABORATORY MODELS OF THE SPACE SHUTTLE AND PART OF THE SPACE STATION

FIGURE 14. EDGES OF A TYPICAL SET OF LEFT VENTRICULAR PET (POSITRON-EMISSION-TOMOGRAPHY) SLICE AND THEIR RECONSTRUCTION BASED ON CARDINAL SPLINE BLENDING FUNCTIONS

FIGURE 16. SURFACE RECONSTRUCTION BASED ON SENSOR FUSION (SURFACE A IS TO BE RECONSTRUCTED FROM IMAGE DATA WHILE SURFACE B FROM RADAR DATA)